Mining Top-K Largest Tiles in a Data Stream
Abstract
Large tiles in a database are the itemsets with the largest area, where the area of an itemset is defined as its frequency in the database multiplied by its size. Mining these large tiles is an important pattern mining problem, since tiles with a large area describe a large part of the database. In this paper, we introduce the problem of mining the top-k largest tiles in a data stream under the sliding window model. We propose a candidate-based approach that summarizes the data stream and produces the top-k largest tiles efficiently for moderate window sizes. To cope with large windows, we also propose an approximation algorithm with theoretical bounds on the error rate. In experiments with two real-life datasets, the approximation algorithm is up to a hundred times faster than both the candidate-based solution and baseline algorithms built on state-of-the-art solutions. We also investigate applications of large tile mining in computer vision and in monitoring emerging search topics.
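To make the area definition concrete, here is a minimal sketch of a naive baseline: it enumerates every itemset that occurs in the current window and ranks the itemsets by area. This is not the candidate-based or approximation algorithm from the paper, whose details the abstract does not give. The function names (tile_area, top_k_tiles, sliding_top_k), the tie-breaking rule, and the toy stream are all assumptions made for illustration, and the enumeration is exponential in transaction length, so it is only usable on tiny inputs.

```python
from collections import deque
from itertools import combinations


def tile_area(itemset, window):
    """Area = (number of transactions containing the itemset) * (itemset size)."""
    items = frozenset(itemset)
    frequency = sum(1 for transaction in window if items <= transaction)
    return frequency * len(items)


def top_k_tiles(window, k):
    """Brute force: score every non-empty subset of every transaction in the window."""
    candidates = set()
    for transaction in window:
        for size in range(1, len(transaction) + 1):
            for combo in combinations(sorted(transaction), size):
                candidates.add(frozenset(combo))
    scored = [(tile_area(c, window), tuple(sorted(c))) for c in candidates]
    # Largest area first; ties broken by the sorted item tuple (an arbitrary choice).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


def sliding_top_k(stream, window_size, k):
    """Yield the top-k tiles after each arrival, keeping only the last window_size transactions."""
    window = deque(maxlen=window_size)  # the oldest transaction is dropped automatically
    for transaction in stream:
        window.append(frozenset(transaction))
        yield top_k_tiles(window, k)


if __name__ == "__main__":
    # Hypothetical toy stream of four transactions.
    stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
    for step, tiles in enumerate(sliding_top_k(stream, window_size=3, k=2), start=1):
        print(step, tiles)
```

Recomputing everything from scratch on each arrival is what makes this baseline impractical for real streams. The abstract indicates that the paper's methods avoid this cost by summarizing the stream or by approximating with bounded error.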
Similar resources
Analysis of Videos using Tile Mining
We investigate how mining top-k largest tiles in a data stream under the sliding window model can be useful for (real-time) analysis of videos and, in particular, for tracking. We first explain how a tracking problem can be cast into a stream pattern mining problem. We then show some preliminary results on tracking in the particular context where both the objects and the camera are moving and w...
Mining Top-K Path Traversal Patterns over Streaming Web Click-Sequences
Online, one-pass mining of Web click streams poses some interesting computational issues, such as the unbounded length of streaming data, a possibly very fast arrival rate, and just one scan over previously arrived Web click-sequences. In this paper, we propose a new, single-pass algorithm, called DSM-TKP (Data Stream Mining for Top-K Path traversal patterns), for mining a set of top-k path traversal pa...
Mining Top-K Click Stream Sequences Patterns
Sequential pattern mining is not only important in the data mining field but is also the basis of many applications. However, running these applications costs time and memory, especially when dealing with dense datasets. Setting the proper minimum support threshold is one of the factors that consume more memory and time. However, it is difficult for users to get the appropriate patterns; it may p...
Top-k-FCI: Mining Top-K Frequent Closed Itemsets in Data Streams
With the generation and analysis of stream data, such as real-time network monitoring, log records, and click streams, a great deal of attention has been paid to data stream mining in the field of data mining. In the process of data stream mining, it is more reasonable to ask users to set a bound on the result size. Therefore, in this paper, a real-time single-pass algorithm, called ...
Ensemble-based Top-k Recommender System Considering Incomplete Data
Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (the prediction problem) or to identify a set of k items that will be of interest to the user (the top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...